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Abstract 

We  present  a  model-based  technique  for  encoding  non- 
rigid  object  classes  in  terms  of  object  prototypes.  Objects 
from  the  same  class  can  be  parameterized  by  identifying 
shape  and  appearance  invariants  of  the  class  to  devise  low- 
level  representations.  The  approach  presented  here  creates 
a  flexible  model  for  an  object  class  from  a  set  of  prototypes. 
This  model  is  then  used  to  estimate  the  parameters  of  low- 
level  representation  of  novel  objects  as  combinations  of 
the  prototype  parameters.  Variations  in  the  object  shape 
are  modeled  as  non-rigid  deformations.  Appearance  vari¬ 
ations  are  modeled  as  intensity  variations.  In  the  training 
phase,  the  system  is  presented  with  several  example  pro¬ 
totype  images.  These  prototype  images  are  registered  to  a 
reference  image  by  a  finite  element-based  technique  called 
Active  Blobs.  The  deformations  of  the  finite  element  model 
to  register  a  prototype  image  with  the  reference  image  pro¬ 
vide  the  shape  description  or  shape  vector  for  the  proto¬ 
type.  The  shape  vector  for  each  prototype,  is  then  used 
to  warp  the  prototype  image  onto  the  reference  image  and 
obtain  the  corresponding  texture  vector.  The  prototype  tex¬ 
ture  vectors,  being  warped  onto  the  same  reference  image 
have  a  pixel  by  pixel  correspondence  with  each  other  and 
hence  are  ** shape  normalized*'.  Given  sufficient  number 
of  prototypes  that  exhibit  appropriate  in-class  variations, 
the  shape  and  the  texture  vectors  define  a  linear  prototype 
subspace  that  spans  the  object  class.  Each  prototype  is  a 
vector  in  this  subspace.  The  matching  phase  involves  the 
estimation  of  a  set  of  combination  parameters  for  synthe¬ 
sis  of  the  novel  object  by  combining  the  prototype  shape 
and  texture  vectors.  The  strengths  of  this  technique  lie  in 
the  combined  estimation  of  both  shape  and  appearance  pa¬ 
rameters.  This  is  in  contrast  with  the  previous  approaches 
where  shape  and  appearance  parameters  were  estimated 
separately. 

1  Introduction 

One  of  the  primary  goals  of  computer  vision  is  to  opti¬ 
mally  and  reliably  compute  object  descriptions  from  im¬ 
ages.  Such  descriptions  are  useM  for  various  purposes  like 
object  recognition  and  analysis.  Important  characteristics 
of  such  descriptions  are  that  they  should  be  easily  com¬ 
putable  and  unique.  Various  methods  have  been  devised  in 
the  past  that  recover  object  descriptions  from  images  for 
recognition.  A  survey  of  such  techniques  focussed  on  face 
recognition  methods  reported  by  Fromherz  and  others  [12] 
implies  that  most  recognition  systems  have  been  demon¬ 
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strated  on  datasets  taken  under  constrained  conditions  that 
tend  to  breakdown  under  varying  pose  and  lighting  condi¬ 
tions.  Statistical  methods,  which  have  the  potential  to  esti¬ 
mate  these  variations,  suffer  from  the  problem  of  requiring 
large  amounts  of  training  data[10].  We  intend  to  formu¬ 
late  a  technique  that  will  perform  reliably  with  such  vary¬ 
ing  conditions  and  at  the  same  time  will  estimate  a  flexible 
model  of  the  object  class  from  a  small  set  of  prototypes. 

A  common  strategy  employed  in  computer  vision  to  de¬ 
sign  effective  algoritluns  is  to  emulate  methods  of  reason¬ 
ing  believed  to  be  used  by  human  beings  to  perform  im¬ 
age  analysis.  The  following  subsection  summarizes  some 
of  the  findings  made  in  the  community  of  psychophysics. 
Based  on  these  observations,  we  lay  our  framework,  the 
objective  of  which  is  to  simulate  the  observations  by  using 
existing  computer  vision  techniques, 

1.1  Human  Visual  System 

Various  psychophysical  and  physiological  studies  [11,  40, 
23,  24,  4,  29]  have  indicated  that  the  human  visual  system 
uses  strategies  that  represent  objects  in  2D  rather  than  3D 
models.  Several  psychophysical  tests  have  been  conducted 
that  explore  different  aspects  of  the  problem  of  recognition 
and  representation  in  human  vision.  In  a  nutshell,  the  fol¬ 
lowing  conclusions  have  been  made  from  the  outcomes  of 
a  variety  of  psychophysical  tests[4,  23]: 

•  Multiple  views 

Evidence  jfrom  various  tests  indicate  that  hu¬ 
mans  encode  three  dimensional  objects  as  multi¬ 
ple  viewpoint-specific  representations  that  are  largely 
two-dimensiond. 

•  Normalization 

Various  stimulus  tests  have  indicated  that  subordinate 
level  recognition  is  achieved  by  employing  a  time 
consuming  normalization  process  to  match  objects 
seen  in  unfamiliar  viewpoints  to  familiar  stored  view¬ 
points. 

•  View  Interpolation 

Various  test  evidences  and  computational  simulations 
indicate  that  view  interpolation  offers  a  plausible  ex¬ 
planation  for  viewpoint  dependent  performance  of  hu¬ 
man  response  times  and  error  rates  for  recognition. 

These  results  imply  that  the  human  visual  system  proba¬ 
bly  uses  heuristics-based  techniques  which  rely  on  multi¬ 
ple  views  representation  in  terms  of  2D  rather  than  explicit 
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Figure  1:  Basic  Idea:  The  images  at  the  vertices  of  the  triangle 
represent  three  prototypes.  The  prototypes  can  be  registered  with 
each  other  by  warping  them  onto  the  average  image.  The  shape 
and  texture  of  each  prototype  can  then  be  combined  to  generate 
new  images.  Here  the  intermediate  images  are  represented  along 
the  edges  of  the  triangle.  For  each  intermediate  image,  the  con¬ 
tribution  of  the  adjacent  prototypes  is  more  than  that  of  the  one 
farther  away. 

3D  models  with  appropriate  depth  information.  This  view¬ 
point  is  further  supported  by  recent  advances  in  computa¬ 
tional  theories  based  on  view  interpolation[ll]. 

1.2  Basic  Idea 

The  abovementioned  results  strongly  indicate  that  the  hu¬ 
man  mind  stores  a  set  of  prototypes  with  significant  vari¬ 
ability  and  uses  their  combinations  to  recognize  an  object. 
This  motivates  us  to  design  algorithms  that  will  represent 
object  classes  in  terms  of  2D  prototype  images.  As  a  start¬ 
ing  point,  it  will  be  worthwhile  to  study  some  existing  com¬ 
puter  vision  models  and  extend  them  to  simulate  the  hu¬ 
man  visual  system.  Intuitively,  an  object  can  be  thought  of 
as  comprised  of  two  components:  shape  and  appearance^ 
Based  on  this  premise,  we  can  categorize  current  computer 
vision  models  broadly  into  two  classes,  namely  shape  mod¬ 
els,  that  account  for  the  object  morphism  and  appearance 
models,  that  account  for  the  object  physiognomy.  Objects 
from  the  same  class  can  be  defined  in  terms  of  small  vari¬ 
ations  from  an  average  object.  These  variations  are  de¬ 
fined  as  displacements  in  an  orthogonal  coordinate  system 
where  each  axis  represents  a  principal  direction  of  varia¬ 
tion,  learned  statistically  from  a  large  set  of  examples. 

Assuming  that  an  object  can  be  parameterized  in  terms 
of  its  shape  and  texture  features,  we  define  our  goal  as  fol¬ 
lows.  ‘‘Given  sufficient  number  of  good  prototypes  that  en¬ 
compass  appropriate  in-class  variations,  we  intend  to  build 

^The  notion  of  appearance  is  same  as  the  object  texture.  Both  terms 
will  be  used  interchangeably 


a  deformable  appearance  model  that  will  reliably  describe 
the  object  class  and  explore  the  feasibility  of  modeling  the 
object  space  by  a  linear  combinations  approach”.  The  pa¬ 
rameters  of  a  novel  object  will  be  expressed  as  a  linear 
combination  of  the  parameters  of  the  prototype  objects  in 
the  training  set.  Figure  1  gives  a  pictorial  description  of  the 
approach.  The  technique  for  linear  combinations  is  exhib¬ 
ited  for  three  prototype  images. 

1.2.1  Existence  of  coupling  between  shape  and  ap¬ 
pearance  parameters 

An  important  observation  made  from  most  of  the  existing 
shape  and  appearance  models  is  that  either  approaches  pa¬ 
rameterize  only  one  feature  and  ignores  the  other.  For  in¬ 
stance,  shape  models  account  for  shape  deformations  only 
and  ignore  texture  parameterization.  Their  objective  is  to 
estimate  the  parameters  of  a  warping  function  that  would 
warp  the  template  image  such  that  it  matches  the  novel  im¬ 
age  as  closely  as  possible.  Similarly,  appearance  models 
ignore  shape  parameterization  by  presuming  that  the  in¬ 
put  images  are  shape  normalized  or  have  a  dense  corre¬ 
spondence  defined  with  respect  to  each  other.  Principal 
directions  of  texture  variations  are  learned  from  the  ex¬ 
ample  images  using  eigen-based  techniques.  Some  recent 
approaches  proposed  by  Cootes[6]  and  Nastar[28]  have 
shown  that  shape  and  appearance  can  be  modeled  together 
by  employing  the  eigen-based  techniques  in  the  combined 
feature  space.  Although,  both  methods  have  been  shown  to 
work  effectively  for  various  object  classes,  it  cannot  be  in¬ 
ferred  that  the  underlying  linearity  assumption  of  the  com¬ 
bined  shape  and  texture  space  can  be  generalized  to  all  ob¬ 
ject  classes.  Hence  we  estimate  the  shape  and  the  texture 
parameters  using  non-linear  optimization  techniques  with¬ 
out  any  assumption  about  linearity.  This  issue  will  be  fur¬ 
ther  analyzed  during  the  preliminary  tests  of  our  system. 

In  Section  2  we  introduce  various  shape  and  appearance 
models  previously  reported  in  the  computer  vision  commu¬ 
nity.  This  section  also  discusses  some  of  the  recent  meth¬ 
ods  that  have  proposed  combined  shape  and  appearance 
parameterization.  The  mathematical  formulation  of  the  de¬ 
formable  model  is  presented  in  Section  3.  This  section  de¬ 
velops  the  combination  of  prototypes  paradigm  and  derives 
the  objective  function.  Section  4  describes  the  modeling  of 
non-rigid  deformable  objects  hy  finite  element  based  meth¬ 
ods.  The  section  gives  a  brief  description  of  Active  Blobs 
which  will  be  used  for  registration  of  objects.  Section  5 
describes  the  various  minimization  techniques  that  will  be 
used  for  registration  and  model  fitting  later.  Section  6  de¬ 
scribes  the  implementation  of  various  phases  of  the  system 
in  detail  and  finally  summarizes  the  algorithm.  Prelimi¬ 
nary  results  are  reported  in  Section  7.  This  is  followed  by 
the  discussion  of  issues  that  explain  the  observed  results  in 
Section  8.  Finally  we  conclude  with  discussion  for  future 
applications  of  the  methodology  in  Section  9. 
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2  Background  and  Previous  Work 

This  section  throws  light  on  some  of  the  deformable  shape 
and  view-based  representations  relevant  for  the  formula¬ 
tion  of  our  problem  and  then  explores  some  methods  that 
try  to  combine  those  representations  to  come  up  with  a 
model  that  makes  the  paradigm  of  linear  combinations  fea¬ 
sible. 

2.1  Shape  Models 

Initial  shape  representation  methods  concentrated  on  ways 
to  employ  flexible  models  by  constraining  the  solu¬ 
tion  space  of  allowed  deformations.  Kass,  Witkin  and 
Terzopoulos[20]  described  a  method  of  representing  ob¬ 
jects  in  images  as  active  contours  or  energy  minimiz¬ 
ing  splines  that  were  guided  by  external  constraint  forces 
and  influenced  by  image  forces  along  the  image  gradi¬ 
ents.  Cootes  proposed  the  Chord  Length  Distribution  or  the 
Point  Distribution  Model[5],  a  method  of  shape  represen¬ 
tation  that  estimates  the  chord-lengths  where  each  object  is 
represented  as  an  n- vertex  polygon.  The  n- vertex  polygons 
are  created  by  placing  points  over  the  object  boundaries  ei¬ 
ther  manually  or  semi-automatically.  Training  is  done  by 
estimating  the  covariances  of  chord  lengths  between  each 
pair  of  vertices  in  the  polygons  created.  Later  Baumberg 
and  Hogg  [2]  used  this  idea  to  track  contours  by  associat¬ 
ing  points  with  B-splines  rather  than  edges  of  a  polygon. 
This  method  is  dependent  on  the  initial  placement  of  the 
points  of  the  polygon  and  due  to  this  reason  models  only 
global  deformations. 

Sclaroff  and  Pentland  [32, 38]  had  proposed  a  method  of 
representing  objects  in  terms  of  modal  descriptions,  which 
is  based  on  the  idea  of  describing  objects  by  their  gen¬ 
eralized  symmetries,  as  defined  by  the  object's  deforma¬ 
tion  modes.  Unlike  the  point  distribution  model,  which 
statistically  modeled  object  shapes,  this  method  physically 
modeled  objects  by  determining  the  modes  of  free  vibra¬ 
tions  of  the  object.  The  modes  of  an  object  define  an 
orthogonal  object-centered  coordinate  system  where  each 
feature  point  can  be  uniquely  described  as  a  combina¬ 
tion  of  those  modes.  This  technique  has  been  used  to 
define  deformable  shape  prototypes  that  could  be  used 
for  shape-based  database  search[36]  and  track  deformable 
objects[37]. 

Cootes  and  Taylor[7]  later  combined  the  statistically 
based  point  distribution  model  and  the  physically  based 
nite  element  model  for  more  reliable  modeling  of  flexible 
objects.  While  the  point  distribution  model  accounted  for 
variation  across  object  shapes,  the  finite  element  model  ac¬ 
counted  for  the  vibrational  modes  of  a  single  object  shape. 
This  combined  formulation  was  shown  to  perform  better 
than  either  of  the  mentioned  methods  alone.  The  major 
drawback  of  this  system  is  the  requirement  of  large  train¬ 
ing  sets.  It  may  be  the  case  that  some  prototypes  are  more 
similar  and  hence  may  form  clusters  in  the  prototype  sub¬ 


space.  Absence  of  sufficient  training  data  will  then  bias  the 
system  to  interpret  all  the  prototypes  as  similar  to  the  pro¬ 
totypes  of  the  largest  cluster  in  the  training  data.  This  phe¬ 
nomenon  is  called  true-sbape  vulnerability[15].  The  point 
distribution  models  addressed  this  problem  by  using  an  hi¬ 
erarchical  representation  of  mean/reference  shapes  where 
at  each  level,  two  nearest  mean  shapes  were  combined  to 
obtain  the  mean  shape  in  the  next  higher  level.  The  hier¬ 
archy  was  built  in  a  bottom-up  fashion  starting  with  all  the 
prototypes  at  the  lowest  level. 

2.2  Appearance-based  Models 

Appearance-based  models  seek  to  obtain  a  compact  repre¬ 
sentation  for  intensity  distribution.  One  such  set  of  tech¬ 
niques  employ  eigen-based  methods  to  compress  an  image 
by  projecting  it  onto  a  low-dimensional  orthogonal  basis, 
the  eigenspace[43].  The  eigenspace  is  constructed  as  fol¬ 
lows.  Given  a  sufficient  number  of  prototype  images  ex¬ 
pressed  as  column  vectors  in  the  object  class  matrix,  we  use 
principal  components  analysis  (PCA)  or  Karhunen-Loeve 
expansion  to  estimate  the  intensity  distribution.  The  eigen¬ 
vectors  of  the  covariance  matrix  span  the  variations  across 
the  object  space  as  captured  by  the  prototypes.  The  eigen¬ 
vectors  corresponding  to  the  higher  eigenvalues  define  the 
directions  of  maximum  variations  and  hence  are  chosen 
to  represent  the  eigenspace.  Compact  representation  for  a 
new  image  is  obtained  by  just  projecting  ^e  image  onto  the 
eigenvectors.  [27, 26,  31, 43]  have  shown  the  effectiveness 
of  the  said  method  especially  for  visual  recognition.  The 
main  weakness  of  eigen-based  techniques  is  that  they  are 
not  robust  to  shape  variations  across  the  prototypes  as  they 
require  the  prototypes  to  be  registered  with  each  other. 

Covell[9]  has  proposed  a  similar  method  based  on  the 
principal  component  analysis  to  define  correspondences 
between  faces  in  terms  of  feature  point  locations.  Fea¬ 
ture  correspondences  so  obtained  were  then  used  for  mo¬ 
tion  estimation  and  morphing[8]  between  different  frames 
of  video.  This  is  an  interesting  idea  where  feature  points 
are  automatically  placed  at  correct  locations.  The  concept 
of  point  distribution  models  was  extended  to  model  inten¬ 
sity  distributions.  These  models  are  known  as  Appearance 
models[l\]  and  have  been  claimed  to  address  the  problem 
of  shape  normalization  which  was  not  addressed  in  eigen- 
faces.  This  method  requires  labeled  examples  for  training. 

2.3  Combined  Parametric  Models 

In  order  to  avoid  the  implicit  parameterization  of  shape  in 
appearance  models  and  make  shape  models  more  photo¬ 
realistic,  there  has  been  a  growing  interest  in  modeling 
both  shape  and  appearance  in  a  single  model  Nastar, 
Moghaddam  and  Pentland[28]  had  combined  physically 
based  modes  of  vibrations  with  statistically-based  modes 
of  variation  by  considering  each  point  in  the  image  as  a 
triplet  of  {x,y,I{x,y))  and  doing  manifold  matching  in 
this  XYI  space.  Although  this  method  combined  both  the 
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statistical  and  physical  modes  of  variation,  it  is  dependent 
on  good  initialization. 

Ullman  and  Basri[44]  have  showed  that  an  object  can 
be  represented  as  a  combination  of  2-D  images  where  the 
images  are  represented  in  terms  of  some  linear  transforma¬ 
tions  in  the  3-D  space.  However,  this  method  assumes  a 
linear  framework  for  object  deformations  and  handles  only 
limited  non-rigid  deformations. 

Poggio,  Jones  and  Vetter[17, 18, 19, 46]  have  suggested 
that  given  sufficient  number  of  prototypes,  the  parameter 
vectors  define  a  linear  space  and  span  the  model  space. 
Any  novel  object  can  then  be  expressed  as  some  combi¬ 
nation  of  those  prototype  vectors.  This  method  combines 
shape  or  geometry  witii  texture  or  appearance  in  a  way  that 
minimizes  both  shape  and  appearance  parameters  to  fit  the 
model.  This  is  a  robust  method  as  the  problem  of  model 
fitting  is  solved  as  a  global  non-linear  minimization  prob¬ 
lem. 

Cootes,  Edwards  and  Taylor[6]  have  also  suggested  a 
combined  formulation  of  their  appearance  and  active  shape 
models  to  develop  a  new  model  known  as  the  Active  Ap¬ 
pearance  Models,  This  method  does  PCA  in  both  the 
shape  and  the  texture  spaces  separately  and  then  combines 
them  and  again  does  PCA  to  remove  redundancies  between 
shape  and  texture  parameters.  All  objects  are  then  repre¬ 
sented  as  some  combination  in  this  orthogonal  model. 

3  Mathematical  Formulation 

Let  Ji,  J2, . . . ,  Jjv  be  the  iV  prototypes  available  for  train¬ 
ing  the  system.  Let  Iref  be  the  reference  image.  The 
objective  is  to  define  a  framework  whereby  all  the  proto¬ 
type  images  can  be  combined  to  generate  images  of  novel 
objects  from  the  same  class.  The  formulation  described 
here  is  similar  in  flavor  to  that  developed  by  Jones  and 
Poggio[17],  though  the  shape  deformations  are  determined 
by  finite  element  methods  as  opposed  to  optical  flow  meth¬ 
ods.  The  prototype  images  are  initially  not  in  correspon¬ 
dence  and  hence  cannot  be  combined.  This  emphasizes  the 
determination  of  pixel  to  pixel  correspondences  amongst 
the  prototype  images.  Let  5i ,  ^2 , . . . ,  5jv  be  a  set  of  shape 
parameters  such  that  each  Si  can  be  used  to  warp  the  ith 
prototype  image  onto  the  reference  image,  thereby  bring¬ 
ing  the  prototype  image  into  correspondence  with  the  ref¬ 
erence  image,  i.e. 

Siix,y)  =  ix,y)  (1) 

where  (f ,  y)  is  the  point  in  li  which  corresponds  to  {Xy  y) 
in  Iref  -  We  define, 

Ti{Xyy)^yV-^\liySi){Xyy)  (2) 

where  W  is  the  warping  function.  Thus,  for  each  proto¬ 
type  li  in  the  training  set,  we  obtain  a  shape  vector  Si  and 


a  inverse  warped  texture  vector  Ti.  Note  that  the  texture 
vectors  are  shape-free  as  all  of  the  prototype  images  are 
inverse  warped  onto  the  same  reference  prototype  image. 

Given  a  large  number  of  prototypes  which  appropriately 
vary  from  each  other  with  respect  to  different  characteris¬ 
tics  of  the  object  class,  we  can  define  a  set  of  parameters 
b  =  [61, 62?  •  •  •  j &iv]  and  c  =  [ci, C2, . . . ,  cjv]  such  that 
the  shape  and  the  texture  of  a  novel  object  Inovei  (not  in 
the  prototype  set)  can  be  derived  as  a  combination  of  the 
prototype  shape  and  texture  parameters. 

-  N 

Snovel  ~  ^  ^  CiSi  =  C  •  S  (3) 

i=l 

N 

Tnovel  ^  biTi  =  b  •  T  (4) 

Therefore,  the  equation  for  the  novel  image  can  be  defined 
as  follows: 

n’"^(/no«eJ,C.S)=b-T  (5) 

Hence  the  matching  phase  reduces  to  matching  the  the 
novel  image,  which  can  be  done  by  minimizing  the  sum 
of  squared  differences  (SSD)  error 

.  x,y 

(6) 

The  values  of  the  parameters  c  and  b  so  obtained,  provide 
a  compact  representation  of  the  novel  image  in  terms  of 
the  prototypes  in  the  training  set.  Since,  the  shape  and  the 
texture  vectors  of  the  prototypes  define  two  completely  dif¬ 
ferent  linear  subspaces  for  die  object  class  and  may  or  may 
not  be  independent  of  each  other,  an  important  caveat  in¬ 
volved  here  is  the  combined  estimation  of  both  the  shape 
and  texture  parameters.  Equation  6  is  the  basic  equation 
that  describes  the  mathematical  formulation  of  the  system. 
Further  constraints  may  be  employed,  depending  upon  the 
modeling  of  the  parameters. 

4  Deformable  Shape  Modeling 

Shape  modeling  in  the  system  is  done  by  using  Gnite  cle¬ 
ment  models  (FEM)[36,  38].  The  advantage  of  finite  ele¬ 
ment  models  is  their  ability  to  enforce  aprioh  constraints 
on  smoothness  and  amount  of  deformation,  which  in  gen¬ 
eral  is  not  possible  in  statistically  based  or  optical  flow 
based  methods.  FEM  is  a  numerical  approach  for  modal 
analysis.  Modal  analysis  is  a  method  for  identifying  the  in¬ 
herent  dynamic  characteristics  of  a  linear  system  in  terms 
of  its  natural  frequencies,  mode  shapes  and  damping  ra¬ 
tios.  The  underlying  idea  of  modal  analysis  is  to  represent 
the  vibration  responses  of  a  system  as  a  linear  combination 
of  a  set  of  simple  harmonic  motions[l].  This  idea  is  similar 
to  Fourier  analysis  of  waveforms. 
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FEM  can  be  used  for  describing  non-rigid  deformations 
of  an  elastic  body.  In  this  formulation,  an  object  is  mod¬ 
eled  as  a  sheet  of  rubber  which  can  freely  deform.  The 
surface  of  the  object  is  interpolated  by  Galerkin  method.  A 
set  of  polynomial  functions  are  defined  that  relate  the  dis¬ 
placement  of  a  single  point  to  the  relative  displacements  of 
other  points.  Hence  all  the  points  can  be  expressed  in  terms 
of  the  interpolation  functions  as  below: 

u(x)  =  H(x)U  (7) 

where  H  is  the  set  of  interpolation  functions,  x  is  a  vector 
of  all  the  data  points  and  U  is  the  vector  of  displacement 
components  at  each  feature  point.  The  strains  produced  at 
each  feature  point  due  to  the  displacement  are  obtained  as 
a  combination  of  the  element  strains  associated  with  the 
feature  points: 

e(x)  =  B(x)U  (8) 

where  B  is  the  strain  matrix  and  e  is  a  vector  of  strains 
produced  at  the  point  under  consideration.  The  problem  of 
modal  displacements  is  then  solved  as  a  dynamic  equilib¬ 
rium  equation: 

MU  -h  DU  -f  KU  =  R  (9) 

where  M,  D  and  K  are  the  mass,  damping  and  stiffness 
matrices  and  R  is  the  load  matrix.  The  reader  is  directed  to 
[38]  for  detailed  derivation  of  all  the  mentioned  matrices. 

The  non-rigid  deformations  are  then  expressed  in  an  or¬ 
thogonal  system  where  the  basis  is  defined  as  the  set  of 
orthonormalized  eigenvectors  of  M“^K.  Given  that  x  is 
the  set  of  all  feature  points,  the  locations  of  the  new  feature 
points  is  given  as  follows: 

m 

x'  =  X  -h  ^  (l)j  Uj  (10) 

i=l 

where  x  is  the  mean  displacement  position,  x'  is  the  de¬ 
formed,  position,  Uj  is  the  jth  mode  parameter  value  and 
is  the  jth  eigenvector  defining  the  jth  modal  displace¬ 
ment.  The  system  can  be  re-ortiiogonalized  to  separate  the 
affine  parameters  firom  the  modal  parameters, 

4.1  Active  Blobs 

We  ust  active  blobs[37]  to  register  the  prototype  images. 
Active  blobs  is  a  non-rigid  region  based  finite  element 
method  used  for  registration  and  tracking  of  deformable 
objects.  Initial  blob  of  a  reference  object  is  created  by  as¬ 
sociating  a  deformable  polygonal  mesh  with  the  object  tex¬ 
ture  map.  Registration  is  solved  as  an  energy  minimization 
problem  where  the  shape  parameters  (in  our  case,  the  finite 
element  modes)  are  estimated  so  that  difference  between 
the  warped  reference  object  and  the  novel  object  is  min¬ 
imized.  Minimization  is  done  by  least  squares  approach 
which  will  be  described  in  detail  in  next  section. 


Let  Iref  be  the  reference  image  fi-oiri  which  the  initial 
blob  is  created.  Given  a  novel  image  Ji,  the  goal  is  to  ob¬ 
tain  a  set  of  mode  parameters  that  will  transform  the  refer¬ 
ence  image  to  match  the  target  image  over  the  region  of  the 
blob.  The  method  works  by  estimating  the  change  in  the 
parameters  required  to  minimize  the  sum  of  squared  dif¬ 
ferences  error  between  the  current  estimate  and  the  target 
image.  In  the  implementation,  instead  of  warping  the  refer¬ 
ence  image  to  the  target  image,  the  target  image  is  inverse 
warped  to  the  reference  image.  The  energy  minimization 
problem  is  formulated  as  follows: 

^image  =  cf  (11) 

^  i=l 

Ci  =  \\Ii{^i}yi)’^Iref{^i^yi)\\  (12) 

where  fi{xi,yi)  is  the  intensity  of  the  pixel  at  loca¬ 
tion  {xi,yi)  in  the  inverse  warped  target  image  Ji  and 
Irefi^u  yi)  is  the  intensity  of  the  pixel  at  the  same  location 
in  ffie  reference  image.  The  adverse  effect  of  the  outliers 
that  tend  to  throw  the  mininodzation  process  out  of  track  are 
handled  by  using  a  robust  error  norm  which  is  a  Lorentzian 
influence  function  p,  given  as: 

1  ” 

^image  =  (13) 

i—1 

p{ei,  a)  =\og(l  +  -^)  (14) 

where  cr  is  an  optional  scale  parameter. 

5  Minimization  Techniques 

This  section  introduces  two  minimization  techniques 
namely,  difference  decomposition  and  Levenberg  Mar- 
quardt,  that  were  used  to  register  prototype  images  by  ac¬ 
tive  blobs  and  fit  the  model  to  novel  images.  While  the  for¬ 
mer  is  a  locally-linear  approach,  the  latter  is  a  non-linear 
approach  of  minimization.  Model  fitting  will  be  described 
in  the  next  section. 

5.1  Difference  Decomposition  Method 

Minimization  of  certain  linear  least  squares  problems  can 
be  accomplished  via  difference  decomposition[l3].  The 
method  works  by  estimating  derivatives  of  the  function 
along  each  parameter.  The  derivatives  are  estimated  by  tak¬ 
ing  difference  between  the  warped  reference  image  and  the 
original  reference  image.  The  warping  is  done  by  adding  a 
small  displacement  to  the  parameter  with  respect  to  which 
the  derivative  is  to  be  taken.  This  works  well  under  the 
assumption  that  the  function  to  be  minimized  is  locally  lin¬ 
ear.  Let  N.  =  {nol  ...|n7n}^  be  a  matrix  whose  rows  are  the 
displacement  vectors.  Let  B  =  be  the  matrix 

whose  rows  are  the  difference  templates  corresponding  to 
each  displacement  vector  obtained  as  mentioned  above: 

bk  =  Io-W-Hlo,nk)  (15) 
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W  is  the  warping  function.  The  intuition  behind  the  ap¬ 
proach  is  as  follows.  During  the  minimization  phase,  if 
the  difference  image  between  the  warped  reference  image 
and  the  target  image  is  similar  to  the  difference  template 
bk,  then  the  change  in  the  warping  parameters  required  to 
register  the  reference  image  with  the  target  image  is  given 
by  riA;.  Thus,  if  the  difference  image  can  be  expressed  as 
a  combination  of  the  difference  templates,  then  the  change 
in  the  parameters  required  is  given  by  the  combination  of 
the  displacement  vectors.  The  difference  patterns  can  be 
generated  from  the  reference  image,  and  hence  are  pre¬ 
computed.  Let  the  current  estimate  of  the  parameters  be 
q.  Then  the  difference  between  the  inverse  warped  target 
image  and  the  reference  image  is  given  as: 

D  =  Io-yV-\lo,q)  (16) 

By  the  linearity  assumption,  this  difference  image  D  can 
be  approximated  as  a  combination  of  the  difference  basis 
vectors: 

D  »  B'^k  (17) 

where  A:  is  a  vector  that  gives  the  combination  coefficients. 
Since  it  is  an  over-constrained  system,  the  solution  is  ob¬ 
tained  by  least-squares  approximation  whereby  k  is  ob¬ 
tained  as: 

(18) 

The  corresponding  change  in  the  warping  parameters  is 
computed  and  the  parameters  are  updated  as  below: 

Ag  =  Nk  (19) 

g'  =  g  -f  Ag  (20) 

5.2  Levenberg  Marquardt  Method 

Levenberg  Marquardt  method  is  a  non-linear  minimization 
technique  that  performs  more  reliably  than  difference  de¬ 
composition  as  it  does  not  make  any  assumptions  regarding 
linearity  of  the  function.  The  technique  uses  a  combination 
of  linear  and  non-linear  approaches  for  updating  parame¬ 
ters  during  each  iteration.  Smooth  switching  between  the 
two  approaches  is  accomplished  by  a  weighting  term  A. 
When  the  magnitude  of  A  is  low,  the  minimization  is  done 
in  a  linearized  fashion  by  Gauss-Newton  method  whereas 
higher  magnitude  of  A  forces  the  system  to  be  solved  in 
quadratic  fashion  by  using  Gradient  Descent  technique. 

The  mathematical  formulation  is  as  follows.  Given  an 
objective  function  J5,  the  parameters  of  which  are  q\  the 
goal  is  to  determine  an  instance  g  that  minimizes  the  value 
of  E.  This  is  achieved  iteratively  by  solving  the  following 
set  of  simultaneous  equations: 

(i?  +  A7)Ag  =  g  (21) 

g'  =  g  +  Ag  (22) 

where  i? ,  g  and  A  are  the  Hessian  matrix,  the  gradient  vec¬ 
tor  and  the  controlling  parameter  respectively.  The  gradi¬ 


ent  vector  and  the  Hessian  matrix  are  determined  as  fol¬ 
lows: 


dE 


_dEdE 
~  dqk  dqi 


(23) 

(24) 


The  cost  of  the  objective  function  is  determined  with  the 
updated  parameter  values  g'.  If  the  cost  has  decreased  as 
compared  to  its  previous  value  then  the  system  tends  to 
linear  minimization  by  scaling  down  A  by  a  factor  of  10. 
If  the  cost  has  increased  then  the  system  moves  towards 
quadratic  minimization  by  scaling  up  A  by  10.  In  the  for¬ 
mer  case,  the  parameters  are  updated  to  g',  whereas  in  the 
latter  case,  the  updated  parameter  vector  g'  is  discarded  and 
we  proceed  with  the  old  parameter  vector  g.  Higher  values 
of  A  restrict  parameter  displacement  in  the  error  space  and 
force  the  solution  to  move  along  the  steepest  gradient. 


5.3  Gaussian  Pyramids 

It  is  not  uncommon  to  find  situations  where  the  minimiza¬ 
tion  solution  gets  trapped  in  local  minima.  This  may  hap¬ 
pen  when  the  error  function  is  not  exactly  concave  or  the 
amount  of  change  allowed  in  the  parameters  do  not  move 
the  current  estimate  closer  to  the  global  minima.  As  a  result 
the  solution  gradually  drifts  into  a  local  trough  and  eventu¬ 
ally  gets  trapped  inside  there.  Such  problems  can  be  han¬ 
dled  reliably  by  using  a  multigrid  relaxation  approach[Al]. 
These  methods  work  by  taking  advantage  of  multiple  dis¬ 
cretizations  and  smoothing  of  a  continuous  problem  over 
a  range  of  resolution  levels.  Solution  to  a  minimization 
problem  requires  computations  proportional  to  the  spatial 
distance  between  the  current  estimate  and  the  actud  so¬ 
lution.  This  suggests  the  possibility  of  speedup  by  com¬ 
puting  the  solution  over  a  coarse  grid  and  then  enhance  it 
by  successively  refining  the  grid.  Pyramids  are  one  such 
multi-resolution  technique  used  in  image  processing[35]. 

The  pyramids  used  in  the  current  implementation  are 
called  octave  pyramids  as  at  each  level  the  image  is  halved 
in  each  dimension  and  subsampled.  Successive  reduction 
in  the  resolution  and  subsampling  results  in  the  loss  of 
high  frequency  components  in  the  original  image.  In  other 
words,  this  is  equivalent  to  filtering  the  image  through  low- 
pass  filters  whereby  the  image  is  blurred  by  Gaussian  ker¬ 
nels  at  each  level.  Thus  at  the  coarsest  level,  it  may  be  as¬ 
sumed  that  all  the  components  corresponding  to  the  local 
minima  are  smoothened  enough  to  be  determined  as  pos¬ 
sible  points  of  solution.  Hence  when  successive  solutions 
are  computed  from  the  coarsest  levels  and  propagated  to 
the  finer  levels,  the  solution  tends  towards  the  global  min¬ 
ima  and  eventually  it  may  be  expected  to  converge  to  the 
actual  global  minima. 
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Figures  3  and  4. 

6.2.1  Difference  Decomposition  Method 


Inputs: 

•  Prototypes:  Ii , /2,  •  •  • , In- 

•  Initial  reference  image:  Jre/- 

Outputs: 

•  Final  reference  blob:  Jre/. 

Steps: 

•  Iterate 

-  For  each  prototype  li  do 

*  Register  Iref  with  Ii. 

^  Si-  set  of  mode  values  required  to  reg¬ 
ister  Iref  with  Ii, 

*  Ti  =  W-^{IuSi), 

EndFor. 

“  Savg  =  ^ 

rp  ^  X  T 

-  J-avg  —  fq  Z>j=l  -^3' 

-  Create  new  reference  image: 

Iref  ~  ^^avg’i  Savg^- 

Iref  Iref’ 

•  Until  iterations  ^  n  and 

Ex,y  WKef  y)  ^ref  {x,  v)  ||  >  threshold. 


Figure  2:  Algorithm:  Average  Image  Computation 

6  Implementation 

6.1  Average  Image  Computation 

If  the  reference  image  happens  to  be  from  a  group  of  im¬ 
ages  that  are  clustered  together  in  the  prototype  space,  then 
the  results  would  be  biased  towards  the  prototypes  in  that 
cluster.  Hence  we  use  the  average  image  which  will  be 
fairly  equidistant  from  each  of  the  prototypes.  This  aver¬ 
age  image  is  computed  in  an  iterative  fashion.  We  start 
with  an  arbitrary  reference  image  Iref-  The  user  circles 
out  the  region  of  interest  from  which  a  blob  is  created.  The 
basic  steps  for  average  image  computation  have  been  enu¬ 
merated  in  Figure  2. 

6.2  Training  and  Matching  Phases 

Once  the  reference  image  has  been  computed,  the  system 
needs  to  be  trained  with  each  of  the  prototype  images.  This 
is  done  by  registering  each  image  with  the  reference  image. 
The  mode  values  are  stored  as  the  shape  vectors  and  the 
inverse  warped  prototype  images  are  stored  as  the  texture 
vectors.  The  reconstruction  of  a  novel  object  is  obtained  by 
minimizing  Equation  6.  The  computations  involved  in  the 
training  and  matching  phases  vary  according  to  the  mini¬ 
mization  method  used.  The  basic  steps  are  enumerated  in 


Difference  decomposition  method,  assumes  that  the  proto¬ 
type  space  is  linear.  The  method  works  by  precomputing 
the  estimates  of  the  first  derivatives  of  the  function  with 
respect  to  each  parameter  and  using  them  later  to  approx¬ 
imate  the  current  difference  between  the  estimate  and  the 
target  image  as  a  linear  combination  of  the  precomputed 
derivatives. 

Training  Phase 

Assuming  that  the  reference  image  is  very  close  to 
the  average  image,  we  can  use  difference  decomposition  to 
minimize  the  objective  function.  The  intuition  is  that  the 
difference  between  the  novel  image  and  the  average  image 
should  be  expressible  as  a  combination  of  the  differences 
between  the  average  image  and  the  prototype  images.  Let 
Savg  and  Tavg  be  the  average  mode  and  texture  vectors 
respectively.  Then  in  the  ideal  case 

N 

,  +  =  0  (25) 

i=l 

where  all  the  coefficients  Ci  and  bi  have  the  value  The 
second  term  in  the  equation  is  a  constraint  term  that  re¬ 
stricts  the  shape  parameters  to  sum  to  unity  in  order  to  ac¬ 
count  for  redundancies  due  to  the  inclusion  of  affine  pa¬ 
rameters  like  scale  and  rotation  etc.  in  the  modal  parame¬ 
ters.  In  the  average  case,  this  term  in  the  equation  is  equal 
to  zero.  The  difference  templates  can  be  computed  by  vary¬ 
ing  each  parameter  rii  by  a  value  5i  such  that  the  energy  of 
the  difference  images  due  to  each  Si  is  same  as  Ethresh> 
where  Ethresh  is  some  threshold.  For  each  energy  level  a 
positive  and  a  negative  Si  is  determined  by  the  bisection 
method.  Initially,  only  one  entry  corresponding  to  a  partic¬ 
ular  parameter  is  non-zero  and  all  others  are  zero.  For  a  sin¬ 
gle  energy  level  Ethresh  then  the  basis  displacement  ma- 
trix  looks  like  N  =  {no„„.  |no„,„  | . . . 
where  Ui^^^  and  rii^.^  are  the  positive  and  negative  dis¬ 
placement  vectors,  that  have  non-zero  entries  Si^^^  and 
^im»n  respectively  at  the  ith  entry  and  0  elsewhere.  These 
displacement  vectors  produce  difference  templates  B  = 
|6o„i„  I  •  •  •  \bmrr,,,  \bmmin  The  mattix  B  may  be 
ill-conditioned  due  to  redundancies  in  the  shape  and  the 
texture  vectors.  As  a  result  it  can’t  be  used  directly  for 
difference  decomposition  as  described  in  Section  5. 

In  order  to  remove  those  redundancies,  we  transform 
the  displacement  vectors  such  that  the  resulting  difference 
templates  would  be  near  orthogonal.  This  transformation 
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Inputs; 

•  Reference  blob:  Ire/. 

•  Difference  decomposition  basis  vectors: 

N  = 

Outputs: 

•  Prototype  texture  vectors: 

T  =  {Ti,T2,...,Tn}. 

•  Prototype  shape  vectors: 

S  =  {SuS2,...,Sn}. 

•  Difference  image  templates: 

B  =  {&!}  &25  •  •  •  j 

Steps; 

1.  For  each  prototype  li  do 

•  Si  =  set  of  mode  values  required  to  register 
Iref  with  li. 

.Ti  =  W-Hli,Si). 

EndFor, 

2.  S  =  {5i,S2,...,S'Ar}. 

3.  T  =  {ri,T2,...,7V}. 

4.  If  minimization  ==  Difference  decomposition 

•  Savg  =  ^  Ylj==l  ^3- 

•  Tavg  =  ^  12j=l 

•  For  each  rti  E  N  do 

”  rishape  =  {n-J  >  •  •  •  >  } . 

-  ntexture  =  ,  .  .  .  ,  nf ^}. 

—  W  ^{Tavg^  Savg){^^y)  “ 
W  ^(ntexture  ’  TjUghape  *  S)(^j2/)* 
EndFor. 

•  B  =  {bi,  &2>  •  ■  •  >  ^^m}* 

•  Save  T,  S  and  B. 

Endlf 

5.  If  minimization  ==  Levenberg  Marquardt  then 

•  SaveT,  S. 

EndK 

"required  only  for  Difference  Decomposition. 


is  obtained  by  using  singular  value  decomposition  (SVD) 
as  follows. 

B  =  U'BV'^  (26) 

N*  =  N{B'^Ut)  (27) 

where  U  and  V  are  left  and  right  singular  vectors  respec¬ 
tively  and  cr  is  the  matrix  of  singular  values  of  J5;  Ut  is 
the  truncated  left  singular  vectors'  matrix  [42,  3].  The  last 
equation  is  obtained  as  the  result  of  linearity  assumption. 
If  rank{B)  =  t  and  R{B)  and  N{B)  are  the  range  and 
null  spaces  of  B  then 

R{B)  =  span{ui,  •  •  • ,  ^t}  (28) 

N{B)  =  span{vt+i,...,Vn}  (29) 

N*  is  the  new  set  of  displacement  vectors  obtained.  Note 
that  N*  will  now  have  fewer  displacement  vectors  as  com¬ 
pared  to  N  because  of  truncation  and  will  not  be  sparse 
anymore.  New  difference  templates  B*  =  {6o*|  •  •  •  \bk*}'^ 
are  determined  corresponding  to  the  displacement  vectors 
in  iV*.  k  is  the  point  where  the  singular  values  were 
truncated.  This  new  B*  matrix  is  now  used  in  place  of  the 
B  matrix  in  difference  decomposition. 

Matching  Phase 

Let  the  mixing  parameters  for  the  shape  and  the 
texture  vectors  of  the  prototypes  be  represented  by 
c  =  [cij...,Civ]  and  b  =  [bi, . . .  ,fcjv].  Given  a  novel 
image  InoveU  the  mixing  parameters  q  =  [c|b]  are 
obtained  by  projecting  the  difference  image  D  onto  the 
difference  decomposition  basis,  where  D  is  obtained  as: 

D{x,y)  =  W“^(J„o«e!,c  •  S){x,y)  -  (b  •  T)(a:,i/)  (30) 

The  change  in  the  mixing  parameters  k  is  obtained  in  the 
least  squares  framework  as  follows: 

E  =  {B*'^k  -  Df  +  'y{W{q  +  N*k)  -  if  (31) 

where  the  second  term  is  the  constraint  term  expressed  in 
matrix  notation.  W  is  an  weighting  vector  which  contains 
I's  corresponding  to  the  shape  terms  and  O's  for  all  texture 
terms.  Differentiating,  both  sides  we  get 

^  =  2*[B*B*'^k-DB*'^ +  ^N*W'^Wq 


-  +  -iN*W'^WN*'^k]  (32) 

Rearranging  the  terms  and  evaluating  to  zero  we  get, 

k  =  {B*B*’^ +  ')N*W'^WN*'^]-\DB*'^ 

-'iN*W'^Wq  +  'iN*W'^]  (33) 

Ag  =  N*k  (34) 

Ag  =  [Ac|Ab]  (35) 


Figure  3:  Algorithm:  Training  Phase 
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The  mixing  parameters  are  then  updated  to  get  the  new 
mixing  parameters  and  this  process  is  iterated  until  they 
stop  changing  or  a  fixed  number  of  iterations  are  complete. 

q  =q  +  Aq  (36) 

The  final  reconstruction  has  the  shape  of  the  deformed  blob 
with  mode  values  obtained  as  combination  of  the  mode  val¬ 
ues  of  all  the  prototypes  as  determined  by  c  parameters  and 
the  texture  as  determined  by  the  b  parameters. 


6.2.2  Levenberg  Marquardt  Method 

The  training  phase  for  this  method  typically  involves  the 
registration  of  the  prototype  images  and  determination  of 
the  shape  and  the  texture  vectors  for  each  prototype.  No 
precomputations  are  required  for  this  method. 

Matching  Phase 

The  matching  phase  for  a  novel  image  involves  the 
minimization  of  the  error  function  given  above.  The 
-minimization  is  performed  by  computing  the  first  and 
second  derivatives  of  the  objective  function  and  equating 
them  according  to  the  first  approximation  principle  for 
derivatives.  For  simplicity,  we  use  forward  warping  in¬ 
stead  of  inverse  warping  for  this  method  of  minimization. 
Hence  the  objective  function  along  with  its  required 
derivatives  is  given  as  follows: 

I  Yli^novel  (X,  y)  -  Wih  •  T,  c  •  S){x,  y)f 

N 

(37) 

fc=l 

Y^[Inoveiix,y)  -  >V(b  ■  T,c  •  S){x,y)] 

®>2/ 

[-W(Tfc,c-S)(a:,j/)]  (38) 

J2[Inovel{x,  y)  -  W(b  •  T,C  •  S)(x,J/)] 

,«W(b-T,c-S),  „ 

1 - aj;— 

N 

+27(;^c;fe-l)  (39) 

k=l 


E{h,c)  = 


dE 

dh 

dE 

dCk 


"-(x,y)  =  [W(b-T,(c  + Ac)-S)(a:,3/) 

-  >^(b-T,c-S)(x,y)]/A  (40) 


Inputs: 

•  Prototype  shape  and  texture  vectors: 

S  =  {5'i,...,5iv}  andT  =  {Ti, . . .  jTjv}. 

•  Novel  Image:  Jnovei. 

•  Basis  vectors  and  difference  templates^: 

N  =  n^}  andB  =  {bi,...,bm}. 

Outputs: 


•  Combination  parameters:  c  and  b. 

•  Reconstructed  novel  image:  Ireconstructiou’ 

Steps: 

•  c  =  b  =  .  •  * ,  q  =  {c|b}. 

•  If  minimization  ==  Difference  decomposition 

-  Iterate 

*  Compute  D. 

*  g  =  g  +  A^. 

*  Compute  £?(c,b). 

-  Until  iterations  ^  n  and 

>  threshold, 

EndK 


•  If  minimization  ==  Levenberg  Marquardt 

-  Iterate 

*  Ag= 

*  g'  =  g  +  Ag. 

*  E  =  {q'^,...,q'^). 

*  h'  =  {q'^+\...,q'^% 

*  Compute  £/(c^b'). 

*  Jf  1|E|1  decreased  then 

•  q  =  q';  A  =  A/10, 
else 

•  A  =  A  *  10. 

Endlf 

-  Until  iterations  ^  n  and 
||jB||  >  threshold. 

Endlf 

•  c  =  {g^,...,g^};b  =  {g^+\...,g2^}. 

•  Ireconatruction  —  VV(b  •  X,  C  •  S). 

^required  only  for  Difference  Decomposition, 


Acfc  =  [0, . . . ,  A:  -  Itimes,  A,  0, . . . ,  0]  (41) 
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Figure  4;  Algorithm;  Matching  Phase 


The  second  derivatives  of  the  given  function  are  approxi¬ 
mated  as  below: 

dmidruj  drrii  drrij 

where  rrik  =  Ck  or  6^.  These  derivatives  so  obtained  are 
then  used  to  define  the  gradient  vector  and  the  Hessian  ma¬ 
trix  and  the  system  of  equations  are  solved  as  described  in 
Section  5.2.  The  parameter  vector  q  is  defined  as  a  com¬ 
posite  vector  of  the  shape  and  the  appearance  parameters: 

q  =  [c|b]  (43) 

This  process  is  iterated  until  the  final  error  magnitude 
drops  below  a  given  threshold  or  a  fixed  number  of  iter¬ 
ations  are  completed.  The  A  in  the  equation  above,  acts  as 
a  time- varying  control  parameter  that  forces  the  solution 
to  follow  the  steepest  gradient  in  order  to  converge  to  the 
minimum. 

7  Feasibility  Study 

The  system  was  implemented  with  both  the  strategies  de¬ 
scribed.  The  system  was  tested  with  two  types  of  datasets, 
namely  face  images  and  sequences  of  heart  images,  and 
was  tested  with  some  images  from  the  training  set  as  well 
as  some  novel  images  that  were  not  present  in  the  train¬ 
ing  set.  It  was  observed  that  the  implementation  worked 
well  with  Levenberg  Marquardt  method  whereas  it  failed 
to  produce  satisfactory  results  with  difference  decomposi¬ 
tion  method.  The  possible  explanations  for  the  observed 
behavior  and  further  extensions  are  discussed  in  Section  8. 

7.1  Test  Set  1:  Face  Images 


Figure  5:  Average  face  image 

The  code  for  registration  of  images  was  taken  from  Ac¬ 
tive  Blobs  which  is  available  on  the  internet^.  The  proto¬ 
type  set  comprises  of  random  face  images  drawn  from  the 
MIT  database  (see  Figure  6).  Several  novel  face  images, 
which  were  not  present  in  the  training  set,  were  tested.  All 
of  those  could  be  reconstructed  in  the  combination  of  pa¬ 
rameters  paradigm  described  earlier.  The  images  are  of 
dimension  128x128  pixels.  The  size  of  the  faces  inside 
the  images  was  typically  around  64x64  pixels.  The  imple¬ 
mentation  makes  extensive  use  of  the  graphics  hardware 
for  texture  mapping  and  bilinear  interpolation.  Currently 

2http://www.cs.bu.edu/groups/ivc/ 


Figure  6:  Prototype  face  images  (#prototypes  =  100) 
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the  reconstruction  of  a  novel  face  image  takes  around  8 
minutes  on  a  R5000  SGI  02,  180  MHz  machine.  Major¬ 
ity  of  time  is  spent  in  combining  the  prototype  images  at 
each  iteration  for  the  reconstruction  of  novel  image.  The 
texture  vectors  for  the  prototype  images  comprise  of  the 
whole  texture.  Significant  speedup  is  possible  by  dimen¬ 
sionality  reduction.  In  future,  we  intend  to  evaluate  the 
system  with  dimensionally  reduced  prototype  texture  vec¬ 
tors,  where  we  will  use  coefficients  obtained  by  projecting 
prototype  images  into  the  eigenspace  instead  of  textures. 
We  expect  the  performance  of  the  system  would  improve 
as  we  will  have  to  combine  less  number  of  eigen-images 
for  reconstruction.  We  can  further  reduce  the  computa¬ 
tion  time  by  doing  minimization  on  only  one  color  channel. 
Various  other  alternatives  are  also  currently  being  explored 
to  improve  the  computation  time  (see  section  8).  The  fol¬ 
lowing  paragraphs  explain  the  working  of  the  system  using 
Levenberg-Marquardt  method. 


Figure  7:  Reconstructed  novel  face  images.  Left:  input  novel 
image;  Middle:  average  image  registration;  Right:  reconstruction 

The  user  starts  with  the .  average  image  computation 
phase.  In  this  phase  the  user  circles  the  region  of  interest 
(in  the  current  experiment,  outlines  the  face  in  the  image) 


and  a  blob  is  created  from  the  encircled  region.  This  blob 
is  then  used  to  register  other  face  images.  For  each  pro¬ 
totype  image,  the  user  moves  the  blob  and  places  it  over 
the  face.  Then  the  blob  is  allowed  to  automatically  deform 
and  register  with  the  face  underneath  it.  Shape  and  texture 
parameters  are  recovered.  Since  all  the  face  images  in  the 
MIT  database  have  been  placed  such  that  the  positions  of 
their  eyes  coincide,  we  can  automate  this  process  by  plac¬ 
ing  the  blob  at  a  designated  position  in  the  image,  rather 
than  having  the  user  place  it.  Alternately,  we  can  use  a 
face  detector  for  initial  placement  of  the  blob,  but  then  it 
would  handle  only  face  images.  The  outcome  of  this  stage 
is  the  average  blob  which  is  then  saved  for  future  use. 

In  the  training  phase,  the  average  blob  is  loaded  and 
the  user  trains  the  system  by  registering  the  prototype  face 
images.  This  phase  can  also  be  automated  as  described  in 
the  previous  paragraph. 

The  matching  phase  involves  the  reconstruction  of  the 
novel  image  specified  by  the  deformable  model.  Final  re¬ 
sults  are  presented  in  the  form  of  combination  coefficients 
for  shape  and  texture  parameters.  We  propose  to  evaluate 
the  quality  and  reliability  of  reconstruction  in  terms  of  er¬ 
ror  residuals  and  correlation  soon. 


Figure  8:  Prototype  heart  images  (#prototypes  =  20) 


In  this  experiment,  we  used  a  test  set  comprising  of  100 
prototype  images  (Figure  6).  The  prototypes  comprised  of 
75  images  of  males  with  mustaches  and  25  images  of  fe¬ 
males.  Figure  7  displays  the  reconstructions  of  some  novel 
images.  The  main  points  that  were  tested  here  are:  (1)  the 
algorithm  is  able  to  reconstruct  novel  faces  by  combining 
prototype  faces;  (2)  the  algorithm  is  capable  of  handling 
significant  variations  (“gender”,  in  this  case);  (3)  the  linear 
combinations  paradigm  can  be  exploited  to  generate  new 
images,  that  have  not  been  seen  earlier  (male  images  with¬ 
out  mustaches,  in  this  case).  The  top  two  rows  in  Figure  7 
display  the  reconstructions  of  two  novel  images  of  females, 
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not  present  in  the  training  set.  The  last  three  rows  display 
the  reconstruction  of  male  images  without  any  mustaches. 
Note  that  the  training  set  did  not  contain  any  male  proto¬ 
type  images  without  mustaches.  The  algorithm  was  still 
able  to  use  the  linear  combinations  paradigm  to  appropri¬ 
ately  combine  male  and  female  prototype  images  to  recon¬ 
struct  those  novel  images. 

7.2  Test  Set  2:  Sequences  of  Heart  Images 

In  order  to  evaluate  the  generality  of  the  approach  de¬ 
scribed,  we  tested  the  system  with  images  other  than  faces. 
The  second  test  set  included  sequences  of  heart  pumping 
taken  from  the  MIT  heart  database.  Since  there  were  only 
38  images,  we  included  all  the  odd  numbered  images  into 
the  training  set  and  used  the  even  numbered  images  as 
novel  images.  The  images  used  for  training  are  given  in 
Figure  8.  The  average  image  for  this  sequence  of  images 
and  reconstruction  of  some  novel  images  are  given  in  Fig¬ 
ures  9  and  10.  The  estimated  shape  and  texture  parameters 
can  be  used  for  various  medical  applications. 


Figure  9:  Average  heart  image 


8  Discussion 

In  this  section,  we  analyze  the  various  observations  made 
during  the  preliminary  experiments. 

8.1  Difference  Decomposition  Method 

Despite  being  an  elegant  real  time  minimization  method, 
difference  decomposition  fails  to  produce  satisfactory  re¬ 
sults  in  the  current  framework.  The  reason  for  this  is  the 
breakdown  of  the  local  linearity  assumption.  In  general, 
for  applications  like  tracking  (as  in  Active  Blobs),  there 
is  an  underlying  assumption  that  the  motion  between  two 
consecutive  frames  follow  image  space  constancy  which 
forbids  extreme  changes  between  two  consecutive  frames. 
This  assumption  may  not  hold  in  the  current  methodology 
as  it  may  be  the  case  that  several  prototypes  are  not  similar 
to  the  average  image,  hence  the  energy  surface  cannot  be 
assumed  to  be  smooth.  Possible  ways  of  addressing  this 
problem  are  given  as  follows: 

•  Recompute  the  difference  images  at  each  iteration 
Although  this  may  help  in  driving  the  system  to  the 
actual  solution,  still  the  overheads  involved  pull  down 
the  very  importance  of  the  method  as  performance 
will  become  extremely  slow. 


Figure  10:  Reconstructed  novel  heart  images.  Left:  input  novel 
image;  Middle:  average  image  registration;  Right:  reconstruction 


•  Time-varying  Jacobian  matrix 

Hager  and  Belhunieur[14]  had  derived  a  method  sim¬ 
ilar  to  difference  decomposition,  where  the  Jacobian 
matrix  is  time- varying  rather  than  being  completely 
precomputed  as  proposed  by  Gleicher[13].  This  sug¬ 
gests  that  it  may  be  possible  to  separate  out  the  two 
varying  components  (shape  and  texture)  and  use  an 
iterative  technique  where  only  one  of  the  parameters 
is  modified  at  a  time  as  Hager  and  Belhumeur  do  by 
using  an  iteratively  reweighted  least  squares  method. 

•  Active  Appearance  Models 

Cootes,  Edwards  and  Taylor[6]  have  used  a  method 
similar  to  difference  decomposition.  However,  they 
control  the  parameter  update  step  by  appropriately 
scaling  the  amount  of  change  in  parameters  at  each 
iteration.  This  is  done  by  adding  the  following  steps 
after  Equation  20: 

1.  jfc  =  l. 

2.  Compute  the  energy  E  with  updated  parameters. 
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3.  If  magnitude  of  E  has  increased  from  its  previ¬ 
ous  value,  then 

4.  Set  k  =  fc/2. 

5.  Set  g' =  g  +  fc  *  Ag,  goto  1. 

6.  Else 

7.  Setg  =  g'. 

This  in  theory  restricts  the  system  from  diverging  by 
permitting  small  displacements  such  that  the  energy 
of  the  system  decreases  monotonically.  However,  this 
technique  is  highly  susceptible  to  get  trapped  in  the 
nearest  local  minima  in  the  solution  space  and  hence 
is  unsuitable  for  image  reconstruction.  Applications 
have  been  shown  for  tracking  where  a  rough  fit  is  suf¬ 
ficient  to  track  and  moreover,  it  is  possible  that  iter¬ 
ations  over  several  frames  may  pull  out  the  solution 
from  the  local  minima  and  drive  it  towards  the  global 
minima. 

8.2  Levenberg  Marquardt  Method 

Though  Levenberg  Marquardt,  being  a  non-linear 
quadratic  minimization  method  addressed  the  problem 
of  linearity,  it  is  plagued  by  large  computation  times 
required  for  the  Hessian  matrix.  The  major  bottleneck 
is  the  number  of  floating  point  multiplications  involved 
which  is  O(n^m^)  where  each  image  has  O(n^)  pixels 
and  there  are  0{m)  prototype  images.  Possible  ways  of 
improving  the  current  status  may  involve  the  following: 

•  Minimization  at  random  pixels 

Instead  of  computing  the  Hessian  matrix  over  all  the 
pixels  in  the  images,  we  can  choose  random  pixels  at 
each  iteration.  This,  in  theory,  tries  to  simulate  the 
stochastic  gradient  method[A2],  but  would  be  more 
efficient  as  it  would  converge  to  the  solution  faster 
by  taking  larger  step  sizes  (implicitly  controlled  by 
A)  at  each  iteration,  if  the  error  function  happens  to  be 
purely  quadratic. 

•  Number  of  prototypes  used 

The  contribution  of  the  number  of  prototype  images  is 
at  least  quadratic  with  respect  to  the  amount  of  com¬ 
putations  performed.  A  significant  improvement  may 
be  obtained  by  selecting  “good”  prototypes  and  ex¬ 
cluding  redundant  prototypes.  This  issue  is  addressed 
in  the  next  section. 

8.3  Prototype  Selection 

Currently,  a  set  of  prototypes  is  chosen  randomly  and  the 
training  is  done  on  this  set.  If  three  prototypes  are  thought 
to  lie  on  a  line  in  the  prototype  space,  then  any  number 
of  extra  prototypes  on  the  same  line  are  redundant  and 
hence  should  be  singled  out.  Though  different  statistical 
methods  like  fc-means  clustering,  hierarchical  clustering, 
Bayes  classifier  etc.  can  be  used,  a  normal  tradeoff  in¬ 
volved  is  that  typical  pattern  recognition  methods  require 


large  training  data  sets  which  are  diverse  enough  to  charac¬ 
terize  the  whole  object  class[10,  36].  Some  possible  tech¬ 
niques  for  manual  or  automatic  selection  of  prototypes  and 
their  drawbacks  are  as  follows: 

•  Manual  selection 

Human  beings,  may  not  be  good  judges  of  variations 
across  different  object  classes,  hence  virtually  make 
this  approach  impossible. 

•  Approximation  by  MDL 

An  useful  way  of  estimating  the  good  prototypes  is 
iht  minimum  description  length  method,  also  known 
as  MDL.  But  the  major  drawback  of  this  method  is  the 
implicit  requirement  of  trying  out  all  possible  combi¬ 
nations  of  the  prototypes  and  finally  select  a  subset 
that  was  best  in  spanning  the  whole  set  of  the  avail¬ 
able  prototypes. 

•  Unsupervised  techniques 

Use  an  unsupervised  technique  for  determining  simi¬ 
larity  between  various  prototypes  and  determine  a  hi¬ 
erarchical  grouping  structure.  One  such  technique  is 
agglomerative  hierarchical  clustering[\0,  16].  An¬ 
other  way  of  grouping  prototypes  in  an  hierarchical 
fashion  was  suggested  by  Hill  and  Taylor[15].  But 
this  method  does  not  suggest  a  way  of  removing  the 
redundant  prototypes,  which  may  be  possible  by  prun¬ 
ing. 

•  Naive  Approaches 

Create  a  similarity  metric  for  measuring  the  simi¬ 
larity  between  two  prototypes.  Initially  we  train  the 
system  with  all  the  prototypes  and  then  determine 
the  similarity  between  each  of  the  prototypes  in  the 
dataset.  We  then  drop  redundant  prototypes,  retaining 
only  one  representative  prototype  from  the  group  of 
prototypes  that  were  grouped  together  as  similar. 

Start  with  only  one  random  prototype  image  in  the 
training  set.  At  each  step  try  to  reconstruct  another 
prototype  image.  If  the  prototype  could  be  recon¬ 
structed  satisfactorily,  then  discard  it  else  include  it 
into  the  training  set  and  re-train  the  system. 

Currently  we  are  implementing  certain  heuristics  based 
clustering  techniques  that  will  enable  us  to  choose  “good” 
prototypes  in  future. 

8.4  Active  Appearance  Models 

The  prototype  spaces  can  be  efficiently  compressed  using 
PCA  in  both  the  shape  and  the  texture  space  separately.  In 
Active  Appearance  Models  (AAM),  the  dimensionality  of 
the  composite  shape  and  texture  is  further  reduced  by  PCA. 
This  assumes  that  composite  shape  and  texture  spaces  can 
be  modeled  as  a  linear  space.  Though  this  would  provide  a 
greater  computational  efficiency,  still  it  is  not  clear  how  the 
performance  of  the  system  would  be  affected  when  some 
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shape  information  is  dependent  on  the  texture  information 
or  vice  versa.  For  instance,  a  change  in  shade  in  a  region 
can  occur  due  to  local  shape  deformation  as  well  as  actual 
variation  in  the  object  appearance.  AAM  can  learn  this 
variation  only  in  the  texture  space  and  might  lose  informa¬ 
tion  associated  with  local  shape  deformation  whereas  the 
deformable  model  presented  would  learn  it  as  a  shape  de¬ 
formation  or  texture  variation  or  both  depending  upon  the 
variations  exhibited  by  the  set  of  prototypes. 

The  question  for  using  either  of  the  mentioned  ap¬ 
proaches  depends  upon  the  requirements  of  the  applica¬ 
tions  for  which  the  system  is  being  designed.  Both  meth¬ 
ods  have  been  proved  to  be  comparable  in  literature.  For 
example,  for  tracking  purposes  where  computational  speed 
is  a  big  factor,  AAM  is  more  appropriate.  For  applications 
like  video  compression  by  using  techniques  like  morph¬ 
ing  etc.  intermediate  sequences  may  easily  be  generated 
by  doing  straightforward  interpolation  of  the  mixing  pa¬ 
rameters.  Intuitively,  this  seems  to  more  closely  resemble 
human  perception[23].  In  AAM,  it  is  not  evident  whether 
simple  interpolation  would  provide  good  results. 

Lastly,  AAM  is  a  statistically  based  model.  Such  mod¬ 
els  can  easily  be  confused  and  may  not  perform  well  if  the 
prototypes  provided  have  large  variations  whereas  we  can 
still  expect  an  optimal  solution  in  our  formulation.  This 
emphasizes  another  aspect  that  the  our  model  will  require 
much  less  number  of  prototypes  for  training  as  compared 
to  AAM  which  will  need  much  more  prototypes  to  reliably 
learn  the  object  class. 

8.5  Invariants 

An  important  point  to  ponder  in  the  linear  combinations 
paradigm  is  that  given  slight  variability  in  the  same  novel 
image  conditions,  each  time  a  different  set  of  shape  and 
texture  coefficients  may  be  recovered.  For  instance,  un¬ 
der  different  illumination  conditions,  the  combination  of 
the  prototype  textures  would  change  as  a  different  group 
of  prototypes  would  become  important  to  express  the  ob¬ 
served  changes.  Similar  observations  can  be  made  when 
the  pose  of  the  object  changes.  This  variability  of  the  com¬ 
bination  parameters  will  render  the  model  unsuitable  for 
recognition  purposes.  Hence  a  set  of  possible  invariants 
need  to  be  derived  from  the  current  framework.  Recently  a 
method  called  quotient  image  has  been  proposed  by  Riklin- 
Raviv  and  Shashua[34],  which  provides  a  way  of  comput¬ 
ing  signature  images  for  different  face  images  under  vary¬ 
ing  illumination.  The  same  idea  can  easily  be  adapted  in 
our  model.  Once  the  novel  image  has  been  reconstructed, 
the  shape  parameters  can  be  used  to  warp  the  texture  to  a 
canonical  shape  and  then  the  estimated  texture  can  be  used 
to  obtain  a  signature  image. 

Given  that  the  objects  under  consideration  are  Lamber¬ 
tian,  the  appearance  can  be  defined  as: 

Ti{x,  y)  =  pi{x,  y)n(x,  y)"^  Sj  (44) 


where  Ti{x^  y),  p(x,  y)  and  n(x,  y)  are  the  intensity  value, 
surface  albedo  and  the  direction  of  the  normal  at  point 
(rr,  y)  respectively.  Sj  is  the  direction  of  point  light  source. 
The  signature  or  quotient  image  (Q)  can  be  derived  as  a 
normalization  of  the  albedo  as  follows: 


Q(x,y) 


p(x,y) 

Pavg(x,y) 


(45) 


Assuming  that  the  shape  parameters  can  be  determined  ac¬ 
curately,  when  warped  on  to  the  same  surface,  the  dot  prod¬ 
uct  of  the  light  source  and  the  surface  normal  would  be 
nearly  constant.  Also  assuming  that  the  prototype  images 
appropriately  span  the  varying  illumination  space,  the  sig¬ 
nature  image  in  the  flexible  model  can  be  derived  as  below: 


^signature  2/) 


(b-T)(x,y) 
Tavg  {x,  y) 


(46) 


where  b  is  the  set  of  estimated  texture  parameters  for  the 
novel  image. 

8.6  Extendibility 

The  framework  described  here  can  easily- be  extended  to 
handle  p^ameters  other  than  just  shape  and  texture.  Illu¬ 
mination  modeling  can  be  included  in  the  current  system 
by  creating  a  separate  basis  for  illumination  variation[30]. 

If  the  prototype  images  are  taken  appropriately  under 
different  illumination  conditions  (say,  with  frontal  light 
source),  the  illumination  can  be  modeled  separately.  Il¬ 
lumination  prototypes  can  be  obtained  by  t^ng  several 
images  of  the  average/reference  image  under  various  light¬ 
ing  condition.  These  illumination  prototypes  can  then  be 
used  to  create  illumination  basis  as  used  by  Hager  and 
Belhumeur[14].  The  new  objective  function  is  given  as  be¬ 
low: 

E(c,b,^)  =  i  J2^W{Inovei,c  ■  S)ix,y)  -  (b  •  T)(a;,y) 

x^y 

N 

-  (^.B)(a:,2/)f +7(53  Cfc- 1)2  (47) 

k=l 

where  B  and  (3  are  the  set  of  illumination  bases  and  the 
combining  parameters  for  those  bases  respectively. 


9  Applications 

The  idea  explored  in  this  paper  is  central  to  the  several  im¬ 
age  analysis  problems.  Some  of  the  possible  applications 
are  as  follows: 


•  Image  Registration: 

This  model  can  be  used  for  registering  deformable 
models  by  reconstructing  the  shape  and  the  texture 
parameters  from  a  set  of  prototypes.  The  mixing  pa¬ 
rameters  obtained  can  be  used  for  clustering  similar 
objects  in  a  shape+appearance  database. 
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•  Image  Analysis: 

Apart  from  being  able  to  model  deformable  ob¬ 
jects,  this  model  has  the  added  benefit  of  the  in¬ 
formation  contained  in  the  texture.  After  registra¬ 
tion/reconstruction  the  texture  information  can  be 
used  for  different  analyses. 

Facial  expression  recognition: 

After  registering  a  face,  the  texture  information  can 
be  used  to  classify  the  expression  on  a  person’s  face. 
Given  that  the  object  space  is  linear,  if  >ve  include  im¬ 
ages  of  same  person  with  different  expressions,  then 
sufficient  number  of  prototype  images  should  be  able 
to  define  clusters  in  the  prototype  space  where  proto¬ 
types  with  similar  expressions  get  grouped  together. 
The  novel  image  would  then  be  nearer  to  a  particular 
group  than  other  prototype  groups  giving  us  informa¬ 
tion  about  expression. 

MR  Image  registration  and  analysis: 

Reconstruction  parameters  in  MR  images  registration 
might  give  some  information  whether  some  organ  is 
normal  in  appearance  or  shape.  Such  an  application 
was  reported  in  [25]  which  relies  on  modal  shape  de¬ 
formations.  Use  of  texture  information  might  help  in 
determining  inflammation,  bleeding  etc. 

Image  Compression: 

The  above  mentioned  methods  may  be  extended  to 
do  analysis  over  time.  Studying  patterns  of  move¬ 
ment  between  different  prototypes  may  provide  in¬ 
formation  about  the  kind  of  action  occurring.  Such 
methods  can  be  used  for  video  compression.  Such 
a  technique  for  animation  using  example  image  se¬ 
quences  has  been  reported  in  the  past  by  Poggio  and 
Brunelli[33]. 

•  Morphing: 

Given  that  the  model  parameters  have  been  estimated, 
it  would  be  interesting  to  see  how  different  views  can 
be  generated.  It  has  been  shown  in  the  past  the  novel 
views  of  an  object  can  be  obtained  by  warping  the 
reference  image  to  different  prototypes  [45].  This  idea 
can  be  used  to  extend  the  work  done  in  [39]  to  include 
appearance  along  with  physical  deformations. 

The  deformable  model  can  also  be  used  for  morph¬ 
ing  between  multiple  images  as  a  graphics  applica¬ 
tion.  This  model  can  be  used  to  improve  the  work 
reported  in  [22]  and  may  be  extended  for  video  com¬ 
pression. 

10  Summary 

In  this  paper,  we  explored  a  model-based  linear  combina¬ 
tions  approach  for  modeling  objects.  The  methodology, 
implementation  status,  results  obtained  so  far  and  possible 
explanations  of  various  observed  behavior  were  described 
in  detail  .  Apart  from  these,  the  method  was  compared 
with  existing  active  appearance  model  and  the  pros  and 


cons  were  brought  out.  Also  various  ways  of  extending 
the  existing  framework  have  also  been  described. 
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